Data Types & Structures in R

Md Zulquar Nain

AMU

January 26, 2025

Data Types

Objects

  • Most statistical software (e.g., SPSS, Stata) operates on datasets, which consist of rows of observations and columns of variables.
  • R is an object-oriented programming language (like Python and JavaScript).
  • So, what is an object?
  • The formal computer-science definition is complex.
  • A working definition: anything to which you can assign a value.

Important

Objects are like boxes in which we can put things: data, functions, and even other objects. (Ben Skinner)

Data Objects

  • Data are usually obtained from external sources
  • Or collected by researchers through surveys
  • In R, these data files are imported into R objects

Data Types

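  • Use the function typeof() to check the type of data
  • typeof() returns the internal storage type of an object
  • The basic data types are
    • Double
    • Integer
    • Complex
    • Logical
    • Character
  • Two further types are built on top of these; class() identifies them, while typeof() reports the underlying storage type
    • Factor (stored as integer)
    • Date and time (Date and POSIXct objects are stored as double)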
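
The difference between the storage type and the class can be checked directly. The following is a minimal sketch; the object names (num, int, cplx, lgl, chr, fac, dat) are illustrative and not part of the original slides.

```r
# Storage type (typeof) for one object of each basic type
num  <- 2.5
int  <- 2L
cplx <- 1 + 2i
lgl  <- TRUE
chr  <- "text"

typeof(num)    # "double"
typeof(int)    # "integer"
typeof(cplx)   # "complex"
typeof(lgl)    # "logical"
typeof(chr)    # "character"

# Factors and dates are classes built on top of the basic types
fac <- factor(c("low", "high"))
dat <- as.Date("2025-01-26")

typeof(fac)    # "integer"
class(fac)     # "factor"
typeof(dat)    # "double"
class(dat)     # "Date"
```

In practice, class() is usually the more informative check for factors and dates, while typeof() shows how the values are stored.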

Data Types

  • Double
    • Doubles are numbers like 2.0, 2.2, 2.999
    • May or may not include a decimal part
    • Mostly used to represent continuous variables like wage, height, age
    • By default, R stores every number you type as a double
x <- 10.5
is.double(x)
[1] TRUE

Data Types

  • Integer
    • Integers are whole numbers (negative, zero, or positive)
    • Mostly used for count variables
    • However, a whole number typed without the L suffix is stored as a double by default
x <- 6
typeof(x)
[1] "double"
x <- as.integer(9)
typeof(x)
[1] "integer"
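
Besides as.integer(), an integer can be requested directly with the L suffix. A short sketch of the difference (the variable names xi and xd are illustrative):

```r
xi <- 6L             # the L suffix asks R for an integer
xd <- 6              # no suffix: stored as double

typeof(xi)           # "integer"
typeof(xd)           # "double"

xi == xd             # TRUE: the values compare equal
identical(xi, xd)    # FALSE: the storage types differ
is.integer(xi)       # TRUE
```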

Data Types

  • Complex
    • Complex numbers have the form \(x+yi\), where \(x\) is the real part and \(y\) the imaginary part
x <- -1
typeof(x)
[1] "double"
sqrt(x)
[1] NaN
# Complex: add a zero imaginary part so that sqrt() returns a complex result
y <- -1 + 0i
sqrt(y)
[1] 0+1i

Data Types

  • Logical
    • A variable of logical type takes the values TRUE or FALSE
    • Usually the result of a comparison or condition
x <- 20
y <- 15
a <- x > y
a
[1] TRUE
typeof(a)
[1] "logical"

Data Types

  • Character
    • Character data represent string values in R
    • A string (a collection of characters) is written inside quotes
    • Anything inside double (or single) quotes is treated as a string in R, even digits
x <- "You are smart"
print(x)
[1] "You are smart"
y <- "5"
typeof(y)
[1] "character"

Data Types

  • Factor
    • Factor represents categorical data
    • For example, gender (male, female) or employment status (employed, unemployed)
    • Also handy in case of panel and longitudinal data
    • Factor objects can be created from character or numeric objects
# Created from Characters
cat <- c("A","B","C")
# Use as.factor() to convert to factor type
cat <- as.factor(cat)
cat
[1] A B C
Levels: A B C
levels(cat)
[1] "A" "B" "C"

Data Types

  • Date and Time
    • Dates and times are needed quite often, especially when dealing with time series models
    • The function as.Date() creates Date objects; as.POSIXct() creates date-time objects
    • See help(as.Date)
today <- "25-03-2023"
today <- as.Date(today, "%d-%m-%Y")
today
[1] "2023-03-25"
today1 <- Sys.Date()
today1
[1] "2025-01-26"
today2 <- format(today1, "%d %B %Y")
today2
[1] "26 January 2025"
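
Once a string has been converted to a Date, arithmetic on it becomes possible, which is what makes dates useful for time series work. A small sketch, assuming the same date format as above; the names d1, d2 and stamp are illustrative:

```r
d1 <- as.Date("25-03-2023", format = "%d-%m-%Y")
d2 <- as.Date("2025-01-26")      # ISO format needs no format string

d1 + 30                          # "2023-04-24"
as.numeric(d2 - d1)              # 673 days between the two dates
seq(d1, by = "month", length.out = 3)
# "2023-03-25" "2023-04-25" "2023-05-25"

format(d1, "%Y")                 # "2023" (format() returns a character string)

# Date-time objects carry a time of day as well
stamp <- as.POSIXct("2023-03-25 14:30:00", tz = "UTC")
class(stamp)                     # "POSIXct" "POSIXt"
```

Note that month names produced by "%B" depend on the locale of the R session.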

Missing Values in R

  • In R, missing data are represented by NA, meaning Not Available
  • Another symbol that appears in R is NaN, meaning Not a Number
  • NULL represents an empty object of length zero
  • -Inf and Inf represent negative and positive infinity
# Create a vector using function `c()`
ownd <- c("10","30","missing")
# Convert to double; "missing" cannot be converted, so it becomes NA (with a warning)
ownd <- as.double(ownd)
# check for the missing value
is.na(ownd)
[1] FALSE FALSE  TRUE
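
The four special values behave differently in calculations. The sketch below illustrates this with a hypothetical vector v; it is not taken from the original slides.

```r
v <- c(10, NA, 30)

is.na(v)                    # FALSE  TRUE FALSE
mean(v)                     # NA: missing values propagate
mean(v, na.rm = TRUE)       # 20: drop NA before averaging

0 / 0                       # NaN
is.nan(0 / 0)               # TRUE
is.na(0 / 0)                # TRUE: NaN also counts as missing
1 / 0                       # Inf
-1 / 0                      # -Inf
is.finite(c(1, Inf, NA))    # TRUE FALSE FALSE

length(NULL)                # 0
is.null(NULL)               # TRUE
```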

Data Structure in R

  • For analysis, data should be structured in a well-defined manner.
  • In R, imported data are stored in one of the following structures:
    • Vectors
    • Matrices
    • Lists
    • Arrays
    • Data Frames

Vectors

  • Scalars vs. vectors: in R, a scalar is simply a vector of length one
  • A vector is a group of values of the same data type
  • The simplest function to define a vector is c(value1, value2, ...)
  • All the operators and functions discussed earlier can also be used on vectors
  • However, operations are performed element by element
# define a vector `vec` with function `c`
vec <- c(2, 4,-4,0)
vec
[1]  2  4 -4  0
print(vec)
[1]  2  4 -4  0
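
Element-by-element behaviour can be seen by applying a few operators to the vector defined above. This is a minimal sketch; the second vector w is introduced only for illustration.

```r
vec <- c(2, 4, -4, 0)
w   <- c(1, 2, 3, 4)   # illustrative second vector of the same length

vec + 1     # 3  5 -3  1   (1 is added to every element)
vec * 2     # 4  8 -8  0
vec^2       # 4 16 16  0
abs(vec)    # 2  4  4  0
vec + w     # 3  6 -1  4   (elements are added pairwise)
vec > 0     # TRUE  TRUE FALSE FALSE
```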

Vectors

  • Other ways to generate vectors
Special functions to create vectors
Function            Explanation
numeric(n)          Vector with n zeros
rep(x, n)           Vector with x repeated n times
seq(x) or 1:x       Sequence from 1 to x
seq(f, x) or f:x    Sequence from f to x
seq(f, x, s)        Sequence from f to x in steps of s
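
Each of these functions can be tried with small arguments. The values below are arbitrary and chosen only to keep the output short.

```r
numeric(3)       # 0 0 0
rep(7, 4)        # 7 7 7 7
seq(5)           # 1 2 3 4 5
1:5              # 1 2 3 4 5
seq(3, 7)        # 3 4 5 6 7
3:7              # 3 4 5 6 7
seq(2, 10, 2)    # 2 4 6 8 10

# Caution: seq() applied to a vector returns its positions, not its values
seq(3:7)         # 1 2 3 4 5
```

For this reason the colon operator is normally used on its own (f:x) rather than inside seq().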

Basic Operations of a Vector

Function     Explanation
length(v)    Number of elements in vector v
max(v)       Largest number in vector v
min(v)       Smallest number in vector v
sum(v)       Sum of the elements in vector v
prod(v)      Product of the elements in vector v
sort(v)      Elements of vector v sorted in ascending order

Basic Operations of a Vector

# define a vector `vec` with function `c`
vec <- c(2, 4,-4,0,1,3)
vec
[1]  2  4 -4  0  1  3
#length
length(vec)
[1] 6
# largest number
max(vec)
[1] 4
#smallest number
min(vec)
[1] -4
#sum of elements
sum(vec)
[1] 6
#product of elements
prod(vec)
[1] 0
# sorting
sort(vec)
[1] -4  0  1  2  3  4

Special Types of Vectors

  • Character Vectors
  • Logical Vectors
  • Factors
# Character Vectors
sname <- c("Raju","abdul","Soha")
sname
[1] "Raju"  "abdul" "Soha" 
# logical Vectors
a <- c(7, 2, 6, 10, 4, 1, 3)
a
[1]  7  2  6 10  4  1  3
b <- a<3 | a>=6
b
[1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE

Special Types of Vectors

  • Factors
  • Rate my teaching😂
  • Good = 1, Very good = 2, Best = 3
  • Ten students’ responses are recorded as numbers
# ratings in numbers
qt <- c(3, 2, 1, 1, 2, 3, 2, 1, 3, 3)
labelnames <- c("good", "very good", "best")
#creating a factor vector
fqt <- factor(qt,labels = labelnames)
qt
 [1] 3 2 1 1 2 3 2 1 3 3
fqt
 [1] best      very good good      good      very good best      very good
 [8] good      best      best     
Levels: good very good best
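
Once the ratings are stored as a factor, they can be tabulated by label. The sketch below rebuilds the factor with explicit levels (an optional argument, added here to make the code-to-label mapping visible) and also shows an ordered version; the name fqt_ord is illustrative.

```r
qt <- c(3, 2, 1, 1, 2, 3, 2, 1, 3, 3)
labelnames <- c("good", "very good", "best")

# levels = 1:3 states explicitly which code receives which label
fqt <- factor(qt, levels = 1:3, labels = labelnames)

table(fqt)         # good: 3, very good: 3, best: 4
nlevels(fqt)       # 3
as.integer(fqt)    # 3 2 1 1 2 3 2 1 3 3 (the underlying codes)

# An ordered factor allows comparisons between categories
fqt_ord <- factor(qt, levels = 1:3, labels = labelnames, ordered = TRUE)
fqt_ord[1] > fqt_ord[2]    # TRUE: "best" ranks above "very good"
```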

Matrices

  • A matrix is a collection of data elements arranged in a two-dimensional rectangular layout.
  • All the elements in a matrix are of the same data type
  • The function matrix() is used to create a matrix
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
# nrow: number of rows
# ncol: number of columns
# byrow: if TRUE, fill the matrix row by row with the supplied data (the default fills column by column)
m1
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
m2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow= 3, ncol = 2, byrow = FALSE)
m2
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Other functions to create matrices

  • Other ways to create a matrix
  • By using the functions dim, cbind and rbind
# Use of `dim` function
x <- 1:12
dim(x) <- c(3,4) 
x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
# Use of `dim` again: reshape the same 12 values into 4 rows and 3 columns
dim(x) <- c(4,3)
x
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
# Use of `rbind` function
# rows must have same length 
r1 <- 1:3
r2 <- c(4, 5, 6)
rm <- rbind(r1, r2)  
rm 
   [,1] [,2] [,3]
r1    1    2    3
r2    4    5    6
# Use of `cbind` function
# columns must have same length 
c1 <- 1:3
c2 <- c(10, 9, 8)
c3 <- c(20, 5, 15)
cm <- cbind(c1, c2, c3)  
cm 
     c1 c2 c3
[1,]  1 10 20
[2,]  2  9  5
[3,]  3  8 15

Names and indexing in Matrices

  • Column names and row names
  • Extracting a specific element(s)
cm 
     c1 c2 c3
[1,]  1 10 20
[2,]  2  9  5
[3,]  3  8 15
# Names of columns and rows
colnames(cm) <- c("Eco", "Hist", "Pol")

rownames(cm) <- c("SI", "SII", "SIII")
cm
     Eco Hist Pol
SI     1   10  20
SII    2    9   5
SIII   3    8  15
# Extracting `Pol` in `SIII`
cm[3,3]
[1] 15
  • Extract first row and all columns
  • Extract second column and all rows
  • Extract all rows, second and third columns
# Extract first row and all columns
cm[1,]
 Eco Hist  Pol 
   1   10   20 
# Extract second column and all rows
cm[,2]
  SI  SII SIII 
  10    9    8 
# Extract all rows, second and third columns
cm[,c(2,3)]
     Hist Pol
SI     10  20
SII     9   5
SIII    8  15
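
Because the matrix now has row and column names, elements can also be extracted by name or by a logical condition. A short sketch that rebuilds cm so it runs on its own:

```r
cm <- cbind(c1 = 1:3, c2 = c(10, 9, 8), c3 = c(20, 5, 15))
colnames(cm) <- c("Eco", "Hist", "Pol")
rownames(cm) <- c("SI", "SII", "SIII")

cm["SIII", "Pol"]              # 15, same element as cm[3, 3]
cm["SI", ]                     # first row, all columns
cm[, "Hist"]                   # second column as a named vector

# drop = FALSE keeps the result as a matrix instead of a vector
dim(cm[, "Hist", drop = FALSE])    # 3 1

# Logical indexing: rows where Pol is greater than 10
cm[cm[, "Pol"] > 10, ]         # rows SI and SIII
```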

Matrix manipulations

  • All the mathematical functions available for vectors can also be applied to a matrix
  • These operations are applied to each element of the matrix
  • Example: multiply the matrix m1 by 3
# define the matrix m1
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow= 3, ncol = 2, byrow = TRUE)
# Multiplying matrix m1 by 3
m3 <- m1*3
m1
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
m3
     [,1] [,2]
[1,]    3    6
[2,]    9   12
[3,]   15   18
  • Matrix algebra is also available, for example matrix multiplication with %*%
  • The matrices must be conformable: the number of columns of the first must equal the number of rows of the second
# define the matrix m1
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
# print m1 and check its dimensions
m1
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
dim(m1)
[1] 3 2
  • Create another matrix with 2 rows and 3 columns
m4 <- matrix(c(1, 2, 3, 4, 5, 6), nrow= 2, ncol = 3, byrow = TRUE)
dim(m4)
[1] 2 3
# Multiplying matrix m1 with m4: (3 x 2) times (2 x 3) gives a 3 x 3 matrix
m5 <- m1 %*% m4
m5
     [,1] [,2] [,3]
[1,]    9   12   15
[2,]   19   26   33
[3,]   29   40   51
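
The dimension rule can be checked before multiplying, and it is worth contrasting %*% with the element-wise operator *. A minimal sketch using the same two matrices:

```r
m1 <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)   # 3 x 2
m4 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # 2 x 3

ncol(m1) == nrow(m4)    # TRUE: m1 %*% m4 is defined
dim(m1 %*% m4)          # 3 3
m4 %*% m1               # 2 x 2 result: rows (22, 28) and (49, 64)

m1 * m1                 # element-wise product, still 3 x 2
dim(t(m1))              # 2 3: t() transposes the matrix

# m1 %*% m1 is not defined (2 columns versus 3 rows) and raises an error
res <- try(m1 %*% m1, silent = TRUE)
inherits(res, "try-error")    # TRUE
```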

Lists

  • In R, a list is a generic collection of objects
  • Each component can hold a different type of data
  • General form: mylist <- list(name1 = component1, name2 = component2, ...)
# generate a list of objects

mylist <- list(A=seq(10, 30,5), student="aadil", idm=diag(3))

#print my list

mylist
$A
[1] 10 15 20 25 30

$student
[1] "aadil"

$idm
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
# print the component names
names(mylist)
[1] "A"       "student" "idm"    
# print component A
mylist$A
[1] 10 15 20 25 30
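
Besides the $ operator, list components can be reached with double or single brackets, which behave differently. A short sketch; the added component score is hypothetical and only illustrates how a list can grow.

```r
mylist <- list(A = seq(10, 30, 5), student = "aadil", idm = diag(3))

mylist[["A"]]           # same result as mylist$A
mylist[[2]]             # "aadil" (access by position)
mylist$A[2]             # 15 (second element of component A)

class(mylist[["A"]])    # "numeric": double brackets return the component itself
class(mylist["A"])      # "list": single brackets return a smaller list

mylist$score <- 85      # add a new component (hypothetical value)
length(mylist)          # 4
names(mylist)           # "A" "student" "idm" "score"
```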

Data Frames

  • An object consisting of several variables
  • Rectangular in shape
  • Rows represent observational units
  • Columns represent the variables
  • Is it a matrix?
  • No
  • A matrix holds only one type of data
  • A data frame can hold a different type of data in each column
  • Use data.frame() to create a data frame, or as.data.frame() to convert an existing object into one
# Creating a data frame

# Define the vectors
year <- c( 1991, 1992, 1993, 1994, 1995)

GDP <- c(200, 250, 300, 320, 400)

INV <- c(100, 150, 100, 120, 200)

macro_mat <- cbind(GDP, INV)

rownames(macro_mat) <- year

macro_mat
     GDP INV
1991 200 100
1992 250 150
1993 300 100
1994 320 120
1995 400 200
# create data frame

macro <- as.data.frame(macro_mat)
macro
     GDP INV
1991 200 100
1992 250 150
1993 300 100
1994 320 120
1995 400 200
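
The point about mixed data types can be illustrated by building a data frame directly with data.frame(). In this sketch the object macro2 and the character column regime (with its values) are made up purely for illustration.

```r
macro2 <- data.frame(
  year   = 1991:1995,
  GDP    = c(200, 250, 300, 320, 400),
  regime = c("pre", "pre", "post", "post", "post"),   # hypothetical labels
  stringsAsFactors = FALSE
)

dim(macro2)               # 5 3
sapply(macro2, class)     # year: "integer", GDP: "numeric", regime: "character"
str(macro2)               # compact overview of the columns

# Converting to a matrix forces a single type: everything becomes character
is.character(as.matrix(macro2))    # TRUE
```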

Accessing Data frame

  • Accessing a single variable
  • Generating a new variable
# Existing data frame
macro
     GDP INV
1991 200 100
1992 250 150
1993 300 100
1994 320 120
1995 400 200
# Accessing the GDP data only
macro$GDP
[1] 200 250 300 320 400
# Generating a new variable in the data frame
macro$lnGDP <- log(macro$GDP)

# Using `with` function
macro$lnINV <- with(macro,log(INV))

# Using `attach()`: this works, but `with()` is usually safer because attached names can be masked by other objects
attach(macro)
macro$total <- GDP+INV
detach(macro)

# Results
macro
     GDP INV    lnGDP    lnINV total
1991 200 100 5.298317 4.605170   300
1992 250 150 5.521461 5.010635   400
1993 300 100 5.703782 4.605170   400
1994 320 120 5.768321 4.787492   440
1995 400 200 5.991465 5.298317   600

Creating a subset of data frame

#Existing data frame
macro
     GDP INV    lnGDP    lnINV total
1991 200 100 5.298317 4.605170   300
1992 250 150 5.521461 5.010635   400
1993 300 100 5.703782 4.605170   400
1994 320 120 5.768321 4.787492   440
1995 400 200 5.991465 5.298317   600
# Taking subset
subdata <- subset(macro, total>400)

subdata
     GDP INV    lnGDP    lnINV total
1994 320 120 5.768321 4.787492   440
1995 400 200 5.991465 5.298317   600
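
The same subset can be taken with bracket indexing, and subset() can also select columns. The sketch below rebuilds a reduced version of macro (GDP, INV and total only) so that it runs on its own; the names sub1, sub2 and sub3 are illustrative.

```r
macro <- data.frame(
  GDP = c(200, 250, 300, 320, 400),
  INV = c(100, 150, 100, 120, 200),
  row.names = as.character(1991:1995)
)
macro$total <- macro$GDP + macro$INV

# Rows with total above 400, written two ways
sub1 <- subset(macro, total > 400)
sub2 <- macro[macro$total > 400, ]
rownames(sub1)    # "1994" "1995"
rownames(sub2)    # "1994" "1995"

# Keep only some columns as well
sub3 <- subset(macro, total > 400, select = c(GDP, total))
names(sub3)       # "GDP" "total"
```

With bracket indexing, the condition goes before the comma (rows) and the column selection after it, for example macro[macro$total > 400, c("GDP", "total")].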

THANKS